Advances and Challenges for Scalable Provenance in Stream Processing Systems
Abstract
While data provenance is a relatively well-studied topic in both the database and workflow systems fields, supporting it within stream processing systems presents a new set of challenges. Given the potentially high event rate of the input streams and the low processing latency requirements imposed by many streaming applications, capturing data provenance effectively in a stream processing system is extremely challenging. Nevertheless, emerging streaming applications (e.g., healthcare analytics applications, financial applications) call for data provenance support. To illustrate this point, data provenance support has become an essential part of the Century stream processing infrastructure that we are building to support online healthcare analytics. At any time, given an output data element (e.g., a medical alert) generated by the Century system, the system must be able to retrieve all the input and intermediate data elements that led to its generation. In this paper, we first describe the concrete requirements behind our initial implementation of Century’s provenance subsystem. We then analyze the strengths and limitations of our current solution, and propose a new provenance architecture to address some of these observed limitations. The paper also includes a discussion of a set of open challenges and issues that we must resolve.
Similar resources
The Case for Fine-Grained Stream Provenance
The current state of the art for provenance in data stream management systems (DSMS) is to provide provenance at a high level of abstraction (for example, which sensors in a sensor network an aggregated value is derived from). This limitation was imposed by high-throughput requirements and an anticipated lack of application demand for more detailed provenance information. In this work, we firs...
Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture
Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...
Supporting On-the-fly Provenance Tracking in Stream Processing Systems
A new class of data management systems that operate on high-volume streaming data is becoming increasingly important. As these systems have to process unpredictable streaming data in real time and deliver instantaneous responses, it becomes very difficult to precisely validate stream processing results in a timely manner, verify stream computation that took place and investigate processing s...
Recent Advances in Computer Architecture: The Opportunities and Challenges for Provenance
In recent years several hardware and systems fields have made advances in technology that open new opportunities and challenges for provenance systems. In this paper we look at such technologies and discuss the implications they have for provenance. First, we discuss processor and memory controller technologies that enable fine-grained lineage capture, resulting in more precise and accurate pro...
Applying Provenance in APT Monitoring and Analysis: Practical Challenges for Scalable, Efficient and Trustworthy Distributed Provenance
Advanced Persistent Threats (APT) are a class of security threats in which a well-resourced attacker targets a specific individual or organisation with a predefined goal. This typically involves exfiltration of confidential material, although increasingly attacks target the encryption or destruction of mission critical data. With traditional prevention and detection mechanisms failing to stem t...
Publication date: 2008